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MODEL SELECTION FOR QUANTUM HOMODYNE TOMOGRAPHY 



^ ! Jonas Kahn^ 
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Abstract. This paper deals with a non-parametric problem coming from physics, namely quantum 
tomography. That consists in determining the quantum state of a mode of light through a homodyne 
measurement. We apply several model selection procedures: penalized projection estimators, where 
. we may use pattern functions or wavelets, and penalized maximum likelihood estimators. In all these 

cases, we get oracle inequalities. In the former we also have a polynomial rate of convergence for 
(-H ^ the non-parametric problem. We finish the paper with applications of similar ideas to the calibration 

of a photocounter, a measurement apparatus counting the number of photons in a beam. Here the 
mathematical problem reduces similarly to a non-parametric missing data problem. We again get 
oracle inequalities, and better speed if the photocounter is good. 



Resume. Nous nous interessons a un probleme de statistique non-parametrique issu de la physique, 
^ ' et plus precisement a la tomographie quantique, c'est-a-dire la determination de I'etat quantique d'un 

^SJ \ mode de la lumiere via une mesure homodyne. Nous appliquons plusieurs procedures de selection de 

modules: des estimateurs par projection penalises, oil on pent utiliser soit des fonctions motif, soit des 
ondelettes, et I'estimateur du maximum de vraisemblance penalise. Dans chaque cas, nous obtenons 
(N ■ une inegalite oracle. Nous prouvons egalement une vitesse de convergence polynomiale pour ce prob- 

^SJ ' leme non-parametrique, pour les estimateurs par projection. Nous appliquons ensuite des idees a la 

calibration d'un photocompteur, I'appareil denombrant le nombre de photons dans un rayon lumineux. 
, Le probleme mathematique se reduit dans ce cas a un probleme non-parametrique a donnees man- 

' quantes. Nous obtenons a nouveau des inegalites oracle, qui nous assurent des vitesses de convergence 

d'autant meilleures que le photocompteur est bon. 
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1. Introduction 

Quantum mechanics introduces intrinsic randomness in physics: the result of a measurement, or any macroscopic 
interaction, on a physical system is not deterministic. Therefore, a host of statistical problems can stem 
from it. Some are (almost) specifically quantum, notably any question about which measurement yields the 
maximum information, or whether simultaneously measuring n samples is more efficient than measuring them 
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sequentially [10]. However, once we have chosen the measurement we carry out on our physical system, we 
are left with an entirely classical statistical problem. This paper aims at applying model selection methods a 
la Birge-Massart to one such instance, which is of interest both practical, as physicists use this measurement 
quite often (the underlying physical system is elementary; it is the particle with one degree of freedom), and 
mathematical, as it yields a nonparametric inverse problem with uncommon features. 

Moreover, as this classical problem stemming from quantum mechanics could be seen as an easy introduction 
to the subject to classical statisticians, we have added more general notions on quantum statistics at the 
beginning of the appendix. The interested reader can get further acquaintance with these concepts through the 
textbooks [11] and [12] or the review article [2]. 

More precisely, the problem we are interested in is quantum homodyne tomography. As an aside, we apply 
the results we get to the calibration of a photocounter, using a quantum tomographer as a tool. The word 
"Homodyne" refers to the experimental technique used for this measurement, first implemented in [17], where 
the state of one mode of electromagnetic radiation, that is a pulse of laser light at a given frequency, is probed 
using a reference laser beam at the same ("homo") frequency. And "Tomography" is used because one of the 
physicists' favourite representations of the state, the Wigner function, can be recovered from the data by 
inverting a Radon transform. 

Mathematically, our data are samples from a probability distribution Pp on R x [0, tt]. From this data, we want 
to recover the "density operator" p of the system. This is the most common representation of the state, that 
is a mathematical object which encodes all the information about the system. Perfect knowledge of the state 
means knowing how the system will evolve and the probability distribution of the result of any measurement 
we might carry out on the system. And these laws of evolution and measurement can be expressed naturally 
enough within the density operator framework (see Appendix) . The density operator is a non-negative trace-one 
self-adjoint operator p on L^(M) (in our particular case). We know the linear transform T which takes p to Pp 
and can make it explicit in particular bases such as the Fock basis. We may also settle for the Wigner function 
W, another representation of the state. That is a two-dimensional real function with integral one, and pp is the 
Radon transform of W. 

The first reconstruction methods used the Wigner function as an intermediate representation: after collecting the 
data in histograms and smoothing, one inverted the Radon transform to get an estimate of W. This smoothing, 
however, introduces hard-to-control bias. Using the pattern functions (bidual bases, in fact) introduced in [6], 
consistency of linear estimators of the density operator was proved in [1]. There were also similar results for 
sieved maximum likelihood estimators. Then, a sharp adaptive estimator for the Wigner function was devised 
in [3] , and this even if there is noise in the measurement (see subsection 13. 6p . 

In this paper, we devise penalized estimators that fulfill oracle-type inequalities among the L^-projections on 
submodels, analyze the penalized maximum likelihood estimator and apply these estimators to the calibration 
of a photocounter. Notice that all these results are derived for finite samples (all the previous works considered 
only the asymptotic regime). We have mainly worked under the idealized hypothesis where there is no noise, 
however. 

The appendix is not logically necessary for the article. We have inserted it for background and as an invitation 
to this field. It first features a general introduction to quantum statistics with a public of classical statisticians 
in mind. We then describe what quantum homodyne tomography precisely is. This latter subsection is largely 
based on [3]. 

Section[2]formalizes the statistical problem at hand, with no need of the appendix, except the equations explicitly 
referred to therein. 

Section [3] aims at devising a model selection procedure to choose between L^-projection estimators. We first 
give general theorems (|3.2l and l3.4p leading to oracle-type inequalities for hard-thresholding estimators. We then 
apply them to two bases. One is the Fock basis and the corresponding pattern functions physicists have used 
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for a while. For it we also prove a polynomial convergence rate for any state with finite energy. The other is a 
wavelet basis for the Wigner function. We finish with a short subsection describing what changes are entailed 
by the presence of noise. Especially, we do not need to adapt our theorems if the noise is low enough, as long 
as we change the dual basis. 

Section |4] similarly applies a classical theorem l|4.2p to solve the question of which (size of) model is best to use 
a maximum likelihood estimator on. 

Section \5\ switches to the determination of a kind of measurement apparatus (and not any more on the state 
that is sent in) using a known state and this same tomographer that was studied in the previous sections. The 
law of our samples are then very similar and we apply the same type of techniques (penalized projection and 
maximum Hkelihood estimators). The fact that the POVM (mathematical modelling of a measurement) is a 
projective measurement (see Appendix) enables us to work with i^-operator norm, however. 



2. The mathematical problem 



We now describe the mathematical problem at hand. 

We are given n independent identically distributed random variables Yi = (AT,, (/)i) with density Pp on [0, tt) x M. 

This data is the result of a measurement on a physical system. Now the "state" of a system is described by a 
mathematical object, and there are two favourites for physical reasons: one is the density operator p, the other 
is the Wigner function Wp. We describe them below. 

Therefore we are not actually interested in pp, but rather in Wp or (maybe preferably) p. The probability 
distribution pp of our samples can be retrieved if we know either p or Wp. 

In other words we aim at estimating as precisely as possible p or Wp from the data {Yi}. By " as precisely as 
possible", we mean that with a suitable notion of distance, we shall minimize E [d{p, p)]. Our choice of distance 
will be partly dictated by mathematical tractability. 

We now briefiy explain what Wp and p stand for. 

The Wigner function Wp : ^ K is the inverse Radon transform of pp. In fact we would rather say that pp is 
the Radon transform of Wp. Explicitly: 

/oo 
W{x COS (j) + y sin (j), x sin (j) — y cos <j))dy. 
-oo 

Figure [1] might be of some help. An important remark is that the Wigner function is not a probability density, 




Figure 1. The value of Pp at {x, 4>) is the integral of the Wigner function over the bold line 
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but only a quasi-probability density: a function with integral 1, but that may be negative at places. However 
its Radon transform is a true probability density, as it is Pp. 

Retrieving Wp from Pp then amounts to inverting the Radon transform, hence the name of tomography: that 
is the same mathematical problem as with the brain imagery technique called Positron Emission Tomography. 

As for p, this is a density operator on the Hilbert space i^(R), that is a self-adjoint positive operator with trace 
1. We denote the set of such operators by iS(i^(]R)). There is a Hnear transform T that takes p to pp. We 
give it explicitly using a basis of L^(R) known as the Fock basis This orthonormal basis, which has many nice 
physical properties, is defined by: 

M^) = fffc(x)e-"'/2 (1) 

where Hk is the fcth Hermite polynomial normalized such that ||V'fc|l2 ^ 1- The matrix entries of p in this basis 
are pj,k = {''Pj, pipk)- Then T can be written: 

T : S{L^{R)) — 

p ^ 



Notice that as we have defined precisely the set of possible p, this mapping yields the set of possible pp and Wp. 
The relations between p, Wp and Pp are further detailed in subsection I A. 21 

Anyhow we may now state our problem as consisting in inverting either the Radon transform or T from empirical 
data. 

This is a classical problem of non-parametric statistics, that we want to treat non-asymptotically. We then take 
estimators based on a model, that is a subset of the operators on (R) , or equivalently of the two-dimensional 
real functions. These models are usually vector spaces, which may not be the domain of the object to be 
estimated. To choose a candidate within a given model, there are different methods, two of which we study, 
projection estimators and maximum Hkelihood estimators. Once we have a candidate within each model, we 
then use model selection methods to choose (almost) the best. 

We first study projection estimators, for which the most convenient distance comes from the norm 

ikii2 = v/ESwp-./E^' 

V 

where the Xi are the eigenvalues of r, and the second equality holds for r written in any orthonormal basis. 
Notice that there is an isometry (up to a constant) between the space of density operators with i^-operator 
norm and the space of Wigner functions with L^-Lebesgue norm, that is: 

\\Wp-Wr\\l^ J I \Wp{q,p)^Wriq,p)\'dpdq^^\\p-T\\l 

For maximum likelihood estimators, we have to make do with the weaker Hellinger distance (see later ((23l) ) on 
L^(]R X [0,7r]), to which Pp belongs. 



L\R X [0,7r]) 



Pp : {x,(j>) pj^kipj{x)ipkis 
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3. Projection estimators 



In this section, which owes much to [16], we apply penaUzation procedures to projection estimators. The first 
subsection explains that we want to obtain oracle-type inequalities. In the second we obtain a general inequality 
where the left-hand side corresponds to an oracle inequality, and where the remainder term in the right-hand 
side depends on the penalty and on the large deviations of empirical coefficients. The two following subsections 
give two ways to choose the penalty term large enough for this remainder term to be small enough. In section [331 
this penalty is deterministic. We design it and prove that it is a "good choice" by keeping Hoeffding's inequality 
in mind. In section [331 the penalty is random, and designed by taking Bernstein's inequality into account. 

We next express these theorems in terms of two specific bases. For the Fock basis, we obtain polynomial worst- 
case convergence rates, using the structure of states. For a wavelet basis, we notice we obtain a usual estimator 
in classical tomography. We finish by saying what can be done if there is noise, that is (mainly) convolution of 
the law of the sample by a gaussian. We multiply the Fourier transform of the dual basis with the inverse of 
the Fourier transform of the gaussian, and as long as we still have well-defined functions, and we can re-use our 
theorems without changes. 



3.1. Aim of model selection 



Let's assume we are given a (countable) L^-basis (ei)igx of a space in which 5(L^(M)) is included (typically 
T(L^(R)), the trace-class operators on L^(R)). We may then try and find the coefficients of p in this basis. 
The natural way to do so is to find a dual basis (/i)iGi such that (T(ei),/j) = 5ij for all i and j. Then, if 
P = Y^i Pie-i we get {pp, fi) = pi for all i. And if the are well enough behaved, then ^ I]fe=i fii^kAk) = Pi 
tends to pi by the law of large numbers. 

Now if we took ^ ■ piCi as an estimator of p, we would have an infinite risk as the variance would be infinite. 
We must therefore restrict ourselves to models to € A^, that is Vect (ei,i G to), where to is a finite set, and M. 
is a set of models (we might take M. smaller than the set of all finite sets of N) . 

We may then write the loss as 

llPm - pf = \pi? + ^\P^- Pi? 

where the first term is a bias (modelling error) and the second term is an estimation error. The risk would have 
this expression: 

i^ni i^m 

where the expectation is taken with respect to Pp, since pi depends on the {Xk,(j>k)- 

If we use an arbitrary model to, we probably have not have struck a good balance between the bias term and the 
variance term. The whole point of penalisation is to have a data-driven procedure to choose the "best" model. 
We are aiming at choosing the model with (almost) the lowest error. We would dream of obtaining: 

TO = arg inf ||/5,„ - p\? . 



That is of course too ambitious. Instead, we shall obtain the following kind of bound, called an oracle inequality: 



E 



C inf ((fip, to) + pcn(TO)) U V 



(2) 



where (P{p, to) is the bias of the model to, C is some constant, independent of p, pen{m) is a penalty associated 
to the model to (the bigger the model, the bigger the penalty) and e„ depends only on n the number of 
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observations, and goes to when n is going to infinity. We shall try to take the penalty of the order of the 
variance of the model. 

Notice that we have given in ^ an unusual form of oracle inequality. These inequalities are more often written 
as 



E 



< 



C inf ((f{p,m) + E[p cn m + Cri- 



Our form implies the latter. 
The strategy is the following: 

First, rewrite the projection estimators as minimum contrast estimators^ that is minimizers of a function (called 
the empirical contrast function, and written 7„), which is the same for all models. We also demand that, for 
any m, this empirical contrast function converges to a contrast function 7, the minimizer in m of which is the 
projection of p on to. 

Second, find a penalty function that overestimates with high enough probability (7 — ^n){pm) for all to simul- 
taneously. Use of concentration inequalities is pivotal at this point. 

The next section makes all this more explicit. 

3.2. Risk bounds and choice of the penalty function 

First we notice that the minimum of 

7(r) = ||r||'-2(r,p) 
= \\p-r\?-\\p\? 

over a model m is attained at the projection of p on m. Moreover 

1 " 

7„ (r) = II rf - 2 ^ - ^ rj, (Xfc , 0,) 

i k=l 

converges in probability to 7 for any to (and all r such that ||r|| = 1 simultaneously), as there is only a finite 
set of i such that 7^ for r S to. 

Now the minimum of 7„ over to is attained by 

1 " 

i^m k—1 

So we have succeeded in writing projection estimators as minimum contrast estimators. We then define our 
final estimator by: 



with 



/5("' - P, 



to = arg min 7„(pm) + pcn„(TO) 



where pen„ is a suitably chosen function depending on n, to and possibly the data. 
We then get, for any to, for any S to, 

7«(P^"'') + Pcn„(TO) < 7„(pm) + pcn„(TO) < 7„(t™) + pcn„(TO). (3) 
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What's more, for any m, for any € m, 



(4) 



with 



'^nij) = {t,p) -^^Tifi{Xk,<l>k) 



i k=l 



Putting together ([3]) and (|4]), we get, for all m and £ m: 



- p < ||Tm - pII^ + 2t/„(/5("' - r„i) + pcn„(m) - pcn„(m). 



We then want to take penalties big enough to dominate the fluctuations Vn- Some manipulations will make this 
expression more tractable. First we bound i/„(/5'-"-* — Tm) by lip''"'' — Tm 11 Xnim U rh), with 



X„(m) = sup iy„{T). 



\\r\\ = l 



Now the triangle inequality gives ||/5^"-' — r,„ || < llp^"-' ~ p\\ + ilP ^ '''nill, so that: 



p^"^^ - p < \\p-Trn\\^ + 2xn{mUm) p - p^"^'' + 2x„ (m U m) 1 1 p - r,„ j | - pcn„ (to) + pen„ (to) 



For all a > 0, the following holds: 



2ah < aa^ + a ^6 



U2 



(5) 



Using this twice, we get, for all e > 0: 



2 + e 



p-p(") 



< ( 1 + - ) Hp - T,„|| + (1 + e)x„(7nU m) - pen„(m) + pen„(TO) 



Noticing that Xn{iTi U to) < Xn(™) + Xn(^) and putting our estimate of the error in the left-hand side: 



2 + e 



P - p(") 



l + -)||p-T„J +2pen(TO)f- < (l + e)(x„(TO) + x„(TO))-pen„(TO)-pen„(m). 
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Now what we want to avoid is that our penalty is less than the fluctuations, so we separate this event and take 
its expectation: 



E 



2 + e 



P- P 



(n) 



1 + ^) ||/9-r,„||V2pcn„(m) ) }V0 



< E [{(1 + e)(x^(m) + xlM) - pcn(m) - pcn{m)} V O] 



< 2E 



sup {(1 + e)xl{m) - pen(?7i)} V 



(6) 



Thus stated, our problem is to take a penalty large enough to make the right-hand side negligible, that is 
vanishing like 1/n. 

We shall use this form of Xni'm)- 



so that 



X„(to) = sup ^^Tiip^-pi) = J'^lPi-Pi 



1 " 

Pi y]/i(2^fe>fc) 

n ^ — ^ 



(7) 



3.3. Deterministic penalty 

First we may try to craft a deterministic penalty. 

We plan to use Hoeffding's inequality, recalling that pi is a sum of independent variables: 

Lemma 3.1. : Hoeffding's inequality Let Xi, . . . ,X„ he independent random variables, such that Xi takes 
his values in [a^, bi] almost surely for all i < n. Then for any positive x, 



^(X,-E[X,]) > 



4=1 



< 



exp 



We may also apply this inequality to —Xi so as to get a very probable lower bound on the sum of Xi. 
This is enough to prove: 

Theorem 3.2. Let p be a density operator. Assume that each fi is bounded, where (/i)i6i is the dual basis of 
(ej)jei, as defined at the beginning of this section. Let Mi = sup(^ 0)gRx[o,7r] Mxi<l>) - inf( 
Let {xi)i(zj he a family of positive real numbers such that X^iei cxp(— x;) ~ a < oo. Let 



Then the penalized projection estimator satisfies 

2 



E 



2 + e 



- p 



< inf ( 1 + - I d'^{p, m) + 2pcn„(m) + iU-fl^. 
meM \ e I n 



(8) 



(9) 
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Remark: Here the penalty depends only on the subspace spanned by the model m. So it is the same whether 
M is small or large. The best we can do is then to take ^A = 'P{2), that is to choose for every vector whether 
to keep the estimated coordinate Pi or to put it to zero. In other words we get a hard-thresholding estimator: 



y^P»l|Pi|>aiei 



with 



a,; = ,/(l + e)(ln(Af,) + ^)^ 



(10) 



Proof. Considering ([6]), we have only to bound appropriately E [sup„ ((1 + e)x^(»Ti) — pcn(m)) V O] . 

Now, by ([7]) and (HI, both Xn("^) and pen,„ are a sum of terms over m. As the positive part of a sum is smaller 
than the sum of the positive parts, we obtain: 



E 



sup {(1 + e)x«("i) - pcn(m)} V 



< E 



E f^i^k,4>k) - p}j - (1 + e) (ln(A/,0 



(1 + ^) 



Each of the expectations is evaluated using the following formula, valid for any positive function /: 



E [/] = / P [/(x) > y] dy. 
Jo 



(11) 



Remembering flO|) we notice that the inequality 



(1 + ^) E /^(^fc' M - - (1 + e) (ln(MO 



Xi\ Mf , 



is equivalent to 



1 " 

n ^ — ' 



> 



k=l 



1 + e ■ 
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We may then conclude, using Hoeffding's inequality on the second line and the value lfTO|) of Ui on the fourth 
line: 



E 



sup {(1 + e)x^(TO) - pen(?7i)} V 



< 



E 



1 " 

- y] fi{xk,<pk) - Pi 

71 ^ 



k=l 



> 



1 + e 



dy 



2n{al + v) 



Y]exp(-a;i) 
(l + 6)a 



2na| \ (l + e)i\/f 



2n 



□ 



3.4. Random penalty 

The most obvious way to improve on Theorem 13.21 is to use sharper inequalities than Hoeffding's. Indeed 
the range of /i might be much larger than its standard deviation, so that we gain much by using Bernstein's 
inequality: 

Lemma 3.3. : Bernstein's inequality LetX\,...,Xn he independent, bounded, random variables. Then 
with 



M = sup||Xi 



for any positive x 



M 



< exp(— x). 



With this tool, we may devise a hard-thresholding estimator where the thresholds are data-dependent: 
Theorem 3.4. Let {yi)iex be positive numbers such that J2iei^^^' — Let then 

X, = 2lni\\f,\\J+y,. 

Let the penalty be a sum of penalties over the vectors we admit in the model. That is, for any S € (0, 1), for 
any i <e2, define 



1 + e 



pen„ = 



(12) 
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and the penalty of the model m: 



pcn„(TO) 



pen; 



Then there is a constant C such that: 

2 



E 



2 + e 



inf 1 + - U^(p,m)+2pen„(TO) VO 
meM„ \ e 



< 



Ca 



where A4n is the set of models m for which i E m Xi < n. 

Remark: As with the deterministic penalty, we end up with a hard-thresholding estimator. Morally, that is, 
forgetting all the small 5 whose origin is technical, the threshold is 



'2P„ [/f]ln||/, 



Proof. Once again we have to dominate the right-hand side of ([6]). We first subtract and add, inside that 
expression, what could be seen as a target for the penalty. Writing 



M. = ll/.|loo> 



[ff] , 



we have 



E 



sup ((1 + e)x^(™) — pcn(m)) V 



< E 



Using |[7|, we bound the first term as a sum of expectations. 



E 



sup(l + e) ( xl{m) - A"? ) V 



< (1 + 6) 5]E 



M, 



sup(l + e) (x'(m)- Vija? ) VO +E ( V - pen(j7i) ) V 

\ i^rn / J L \iGm / 



(13) 



(14) 



1 " 

Pz — y^^fiixk,^ 



fc = l 



-af I VO 



We now bound each of these expectations using ifTTj) . 

r/ 1 " ' 

^ y]/»(a;fc,0fc) 

y fc=i 

We change variables in the integral, choosing ^ defined by: 



■Jo [ 



> 



dy. (15) 



(16) 
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Using Bernstein's inequality, the integrand in (fT5| is upper bounded by 2exp(— ^). Given the value of Ui (fT3)) . 
the range of the integral is now from Xi to oo. Finally, taking the square on both sides of (fT6| . then using ([5]), 
we get: 



dy 



< 



3 2V? 



2 / Af? M, , ^ 

-^\^^^ + ^y2^V^ ) 

2 



9 " 2 

11 M2 \ 
2.. + ^^ de 



Hence 



E 



1 " 

-y]/«(a;fe,</'fc) 

rj ^ — ^ 



-^af I V 



4 /"^ / 11 A/2 \ 

4 / llA//? , A 
= — ( 2t;, + ^^(l + Xj) j exp(-xO. 



(17) 



Let us now look over the second term of (fl4|). We notice, through lfT2|) and fT3l) . that this term is of the form: 



1 + e 



A/ja;j 



A/iX,; 



vo 



< i±£^E[2(a?-6?)vO], 



with 



j^(nF„[/f]., + Mf(i + i) 



Using again ((TT|), we end up with: 



E 



< 



1 + e 



(1 - S)v, - { nP„ + A/f ( i + i ) .T, ) > 2y 



dy. (18) 



We can again make use of Bernstein's inequality: 



If^ll £ 

oo ^ 



< exp(-S). 



Noticing that ff is non-negative everywhere, so that E [//] < E [/f] ||^, and using (O, we get: 

< exp(-0. 



(l-5)z;.>nP„[/f]+A/f + 
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Recalling ifTSj) . we get 

(1 - 5)v, - (^nP„ [/f 



Mf 



1 1 



3 5 



dy = 



exp -.Ti 



1 1 



dy 



3 S 



< cxp(-?/i) 1 2 + 



With that and (flTl). we are left with: 



E 



sup {(1 + e)x^(m) - pen(m)} V 



< — ^e-^'(^;, + M2(l + a;,)) + a;,e- 



^6I 



Replacing xt with its value, and overestimating Vi by nAf/ we obtain (under the condition that 2 In Mi + yi < 



E 



sup{(l + e)x^(m) - pcn(7«)} VO 



< C 



(- + ^) 



□ 



Remark: The logarithmic factor in the penalty (that would not be here if we took only the variance) comes 
from the multitude of models allowed by a hard-thresholding estimator. By selecting fewer models (for example 
the square matrices obtained by truncating the density operator) and using a random penalty, we can get rid of 
this term. However, crafting the penalty requires much more work and more powerful inequalities (Talagrand's). 
An interested reader may have a look at the section 3.4 of [13]. 



3.5. Applications with two bases 

We now give two bases that are reasonable when applying these theorems. As can be seen from ([2]), a good 
basis should approximate well any density operator (so that the bias term gets low fast when m is big), with 
dual vectors having a low variance. With the first of the two bases, we have this interesting phenomenon that 
we obtain a polynomial convergence rate under the mere physical hypothesis that the state has finite energy. 

3.5.1. Photon basis 

Here we shall take as our {ei)i^x a slight variation of the matrix entries of our density operator with respect to 
the Fock basis H]). 

More precisely, we worked in the previous subsections with real coefficients. To apply Theorems 13.41 and 13. 2^ 
we then need to parametrize p with real coefficients. The matrix entries are a priori complex. However, using 
the fact that p is self-adjoint, we may separate the real and imaginary parts. 

We use a double index for i and define the orthonormal basis, denoting by Ej^k the null matrix except for a 1 
in case (j, k) : 

( ^_{E,,k+Eu.,) if.7<fc 
ej,fe = < ^_{Ekj~E,,k) ifk<j . 
[ Ejj ifj = k 



14 



TITLE WILL BE SET BY THE PUBLISHER 



Then, using a tilde to distinguish it from the matrix entries, with pj^k = {p, ej^fe),we have 

-^{Pj.k + ipk^j) ifj<A: 
{ipj.P'ipk) = { -^{Pkj - iP],k) if J > A: 
Pj,3 if j = fc. 



The associated fj^k are well-known. They are a slight variation of the usual "pattern functions" (see Appendix 
IA.21 and ((37| therein), the behaviour of which may be found in [1]. Notably, we know that: 



TV 



j.k=Q 



(19) 



As the upper bounds on the supremum of fj^k may not be sharp, the best way to apply the above theorems 
(especially Theorem p.2|) ) would probably be to tabulate these maxima for the (j, k) we plan to use. 

The interest of this basis is that it is a priori adapted to the structure of our problem: if we have a bound on 
the energy (let's say it is lower than H + we get worst-case estimates on the convergence speed with the 
deterministic penalty: indeed, the energy of a state p may be written i + J^jJPj j^ so that 

2^Pi.j < j^- 



Moreover, by positivity of the operator. 



Plk+Pk.j < Pj.jPk,k- 
If we look at the models such that In — {(j, k) : j < N,k < N}, we can get: 

oo N 



d'ip,N) < Y.~pik-Y: 

j,k=0 j,k=0 



Plk 



N 



j>0 j-O 

< i-(i- 



N' 



< 



2H 



where we have used that the density operator has trace one. 
We substitute in ^ and get: 

H 

N 









2' 




E 




- p 







pcnjA^) 



Now, using the bounds on infinite norms (fT9|) . the penalty is less than: 



pen„(7V) 



C- 
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Optimizing in TV (TV = C{Hn)^^^°), we get 



P ' - P 



< Cij''/i°ln(i7)n-3/i0ln(n). 



(20) 



This estimate holds true for any state and is non-asymptotic. It is generally rather pessimistic, though. For 
many classical states, such as squeezed states or thermal states, pjj = Aexp{—B/n), the same calculation 
yields a rate for the square of the L^-distance as In(n)'^ for some f3. In such a case, the penalized estimator 
automatically converges at this latter rate. 



3.5.2. Wavelets 



Another try could be to use functions known for their good approximations properties. To this end we look at 
the Wigner function and write it in a wavelet basis. 

Recall that wavelets on M are an orthonormal basis such that all functions are scaled translations of a same 
function, the mother wavelet. In multiscale analysis, we use a countable basis t/jj^k '■ x ^ 2-^/^'!/;o,o(2-'a: + k), for 
j and k integers. Let Vi = {i^j^k ■ j < There is a (j), called father wavelet, such that the (pk{x) = 4>{x + k) 
for k £ 1 are a basis of the vector space generated by all the wavelets of larger or equal scale, that is Vq. We 
may choose them with compact support, or localized both in frequency and position. So they harvest local 
information and can fetch this whatever the regularity of the function to be approximated, as they exist at 
several scales. 

From a one-dimensional wavelet basis '■ x i— *■ 2^/^ipo^o{2^ x + fc), and zero mean, with a father wavelet 
0j,fc, also C^, we shall make a tensor product basis on L^(R^): let / = {j, k, e) be indices, with j integer (scale), 
k = {kx,ky) e Z'^ (position), and e € 0, 1, 2, 3. Let 



(l)j,k{x)(j)j^k{y) if e = 

(l)j,k{x)ipj^kiy) if e = 1 

^j,k{x)(j)j,k{y) if e = 2 

ipj,k{x)ipj,kiy) if e = 3 



We may then define a multiscale analysis from the one-dimensional one (written V, W): Vo = Vo CS" Vo and for 

all j G Z, Vj+i = Vj © Wj , so that Wj+i = VjWW] © W]WU] © Vj (g) Wj. 

For any j, Vj U ljfc>j is then an orthonormal basis of L^(IR^). We hereafter choose our models as subspaces 
spanned by finite subsets of the basis vectors for well-chosen j . 

It can be shown that: 

1 /•°° - 

7/(a;,(/)) = — / |u| 5'/(mcos0, usin(/))e"'"d-« 
47r 



fulfills this property: 

[7/,A7] = (*/,/). 

Noticing that 

"fi{x,(j)) = 2^7o,o,d(2-'x - /ca; 008 - fcy sin(^, 0), 
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we see that these functions have the same dilation properties as wavelets, and they are "translated" in a way 
that depends on 4>, through sinusoids. Their normalizations, though, explode with j; this derives from inverting 
the Radon transform being an ill-posed problem. 

We can now apply Theorem 13.41 Before doing so, though, we restrict ourselves to a finite subdomain of M^, 
which we denote V, and put the Wigner function to zero outside this domain, that we should choose big enough 
to ensure this does not cost too much. 

Then, M. is the set of all models characterized by 

m = {(ji, fc, 0) : V'k eV]\J {(j, k, e) : (j, k, e) S I'^ C {(j, fc, e) : e = 1; 2; 3, ji < j < Jq, Vk eV}] . 

To have good approximating properties, we choose 2^^ = n^/^ and 2^° = -(j^i^- The projection estimator within 
a model is then: 



with 

1 " 

ai = - y'7/(a;^,(/)0• 
n ^ — ' 

1=1 



Denoting — ||7o.o,e|loo; the translation of Theorem 13.41 gives (notice that applying l|3.2p would be awkward, 
as the variance of 7/ is like 2^ whereas its maximum is like 2^-'): 

Theorem 3.5. Let yi be such that J2i '2xp(— ?//) = cr < 00. For example yi = j . Let then: 

XI = 2(j+ln(B,))+2//. 
We choose an a E (0, 1) and the penalty (and restraining ourselves to the m such that I E m ^ xj < n): 



Then there is a constant C such that: 



E 



P- P 



(n) 



inf 



1 + - d\p,m) + 2pcn„(m) > V 



< 



^ + C-2^n_ 



(21) 



Proof. First it's easily checked that xj ^ 2 ^'n-{\\'^i\\ao) + yi- Second J2i '2xp(— j) = C J^j 2-' exp(— j) < 00 impHes 
that yi — j does indeed the work, as there are at most C2^ wavelets at scale j whose support meet V. 
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The last term is the variance of aj^^k,o, corresponding to the vectors that are in every model.: 



1. 
n 



2^1 kev 



< 



< 



il 

n 



fc,0 



2^1 kev 



- E 



2^1 kev 



d<t) 



x[0,7r] 



- y] / 7|i,fc,o(2;,0) / pp{x-k^ cos (l}-kysin(j3,(l3)dx — 



where we have used that for all x and k, Jq Ppix — coscjj — ky sin0, (j))^ is less than a constant about 1.086. 
Indeed, the translation of a Wigner function is still the Wigner function of a state, so that we may take k = Q. 
Then 



Pp{x ~ kxCO&(j) — kySm(j),(j)) — < sup 

i,x 



and the upper bound on this supremum is due to Cramer (10.18.19 in [9]). 



□ 



Remarks: As the variance of 7/ goes like 2^ the threshold might be seen as C2^/^ y ^ . This is the estimator 
studied in [4], for a general Radon transform (i.e. not a Wigner function). 

The role of the approximation speed is apparent in (|2T|) . Articles like [4] show that this strategy is asymptotically 
(quasi)-optimal for approximating a function in a Besov ball. However, this is no proof of the efficiency in our 
case, as the set of Wigner functions is not a Besov ball. There is still some work in approximation theory needed 
there. In particular, we do not know if a statement similar to ((20|) can be proven. 



Finally, notice that we may combine projection estimators: as the contrast function is the same for any basis 
we are working with, keeping the same penalizations, we could find an estimator that is almost the best among 
those built with the photon basis and those with the wavelet basis simultaneously (just add a ln(2) to a). In 
other words, we do not have to choose beforehand which basis we use. Moreover an estimator allowing for the 
two bases would satisfy (|20| 



3.6. Noisy observations 

The situation we have studied was very idealized: we did not take any noise into account. In practice, a number 
of photons fail to be detected. These losses may be quantified by one single coefficient 77 between (no detection) 
and 1 (ideal case). We suppose it to be known. 

There are several methods to recover the state from noisy observations. One consists in calculating the density 
matrix as if there was no noise, and then apply the Bernoulli transformation with factor 77"^. We can also 
use modified pattern functions [5]. Or we can approximate the Wigner function with a kernel estimator that 
performs both the inverse Radon transform and the deconvolution [3]. The former two methods fail if the 
observations are too noisy (77 < i), but the latter is asymptotically optimal for all 77 over wide classes of Wigner 
functions. 
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This noise can be seen as a convolution of the result {X, $) with a gaussian of variance depending on rj: 



or equivalently in terms of generating functions 



dx 



pl{x,(j))e'''''dx = e ^^^""^ j Pp{x,(j})t 



We can use the methods described above and then use the Bernoulli transform. For free, we may also use the 
modified pattern functions /J'j, knowing fj^^. Explicitly we see that the dual basis of the matrix entry pj^u 
becomes: 

flki^.<l^) = ^ j dre"^^" j dyf,.k{yA)e'''''- 

The reason why one needs 77 > ^ is for this Fourier transform to be well defined. 
And we can again apply Theorems 13.21 and 13.41 with the dual basis /J'^.. 

Obtaining results with high noise ?y < 5 is harder. We would need to introduce a cut-off h within the inverse 
Fourier transform (and therefore a bias). Using the same h as in [3] would ensure this bias b{p,h) is asymp- 
totically reasonable. We could then reuse Theorems 13.21 and 13.41 to have an "almost best" approximation of 
p + b{p, h) within a set of models, for finite samples. Careful examination would then be required to check the 
variance (or the penalties) go to as n and h{n) go to infinity. Moreover, we would need to translate conditions 
on the Wigner function into conditions on the density operator to see whether we can reproduce the asymptotic 
optimality results of Butucea et al. with model selection in the Fock basis (or any other basis chosen and studied 
a priori). 



4. Maximum likelihood estimator 



Projection estimators are not devoid of defects: the variance of empirical coefficients might be high, and the 
convergence therefore rather slow, the estimator is not a true density matrix... Especially, the trace is probably 
not one, though this could be fixed easily enough. We can diagonaHze the estimated density matrix, replace 
the negative eigenvalues with 0, and divide by the trace. 

Anyhow, there are other types of estimator that automatically yield density matrices. One such estimator is 
the maximum Hkelihood estimator, which selects the nearest point of the empirical probability measure in a 
given model for the Kullback-Leibler distance (which is not a true distance as it is not symmetric) . Recall that 
the Kullback-Leibler distance between two probability measures is: 

K{p,q) = j\nl^^p{x)dx. 

In other words, the maximum Hkelihood estimator is 

n 

arg min - In (X; , $/ ) 

where Q is any set of density operators (such that the minimum exists). This way, it is automatically a true 
density operator. A practical drawback is that calculating it is very power-consuming. 
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As 7n(-) — > — / ln(p.)c?pp , we have defined a minimum contrast estimator in the sense of section \3A\ Much hke 
for projection estimators, the Kullback distance thus estimated might be overly optimistic, and all the more 
when Q is big. Indeed, if Q is the set of all density operators, then there is no minimizer of the distance with 
the empirical distribution; however when we take only finite-dimensional models, such as 

Q{N) = {r e : rj,fc = for allj > TV orfc > TV} , (22) 

then the minimum is attained by compactness. Here the matrix entries Tj^k are taken in the Fock basis |(T]). 

We then have to define a penalty for choosing (almost) the best model. To do so, we make use of a (slightly 
simplified but sufficient for our needs) version of a theorem by Massart [16], but we need a few definitions before 
stating it. 

First we need a distance with which to express our results, and it is not the Kullback-Leibler, but the HeUinger 
distance. The HeUinger distance between two probability densities is defined in relation with the i^-distance 
of the square roots of these densities: 

h\p,q) = IJiV^-V^f- (23) 
This distance does not depend on the underlying measure. The following relations are well-known: 

l\\p-q\\l < h^ip,q)<^\\p-q\\i 

h\p,q) < ^K{p,q). (24) 

The penalty to be defined shall depend on the size of the model, that we have to estimate. The right tool is the 
metric entropy, and more precisely the metric entropy with bracketing of the model. 

Definition 4.1. Let Q a function class. Let Nb,2{S, Q) be the smallest p such that there are couples of functions 
[fl"^ /j^] for i from 1 to p that fulfill ||//^ — fi'\\2 — ^ every j, and for any f € G, there is an i G such 
that: 

/.^ < / < /f • 

Then Hb,2{S, Q) ~ \nNB,2{5, Q) is called the (5-bracketing entropy of Q 
Remarks: 

• Notice that the and Z/' need not be in ^. 

• The 2 in Hb,2 stands for L? distance. 

Looking closely at definition 14. H we see that the concept of entropy depends only on those of positivity and 
norms. We may then define a similar bracketing entropy for any space with a norm and a partial order, 
such as the ^-bracketing entropy of Q{N): we must find couples of Hermitian operators [t^ .rf] such that 
IItj^ — r/^ll^ < J and such that for any r € Q{N), there is an i such that t[ <t < . 

We are chiefiy interested in the entropy of square roots of density (denoted by Hb,2{S,V^)): 

V'/\N) = {^p-.PpEViN)}. 



Now the Theorem by Massart [16]: 
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Theorem 4.2. Let Xi, . . . , Xn be independent, identically distributed variables with unknown density s with 
respect to some measure ^. Let {Sm)meM be an at most countable collection of models, where for each m e A4, 
the elements of are assumed to be densities with respect to fi. We consider the corresponding collection of 
maximum likelihood estimators s,„ . Let pen : M — > M and consider the random variable m such that: 

P„ [- ln(s,-„)] + pen(77i) = inf P„ [- ln(s„J] + pcn(m). 
Let {xm)meM a collection of numbers such that 

For each m, we consider a function (j)m of IR+* , nondecreasing, and such that x ^ !tntiEl nonincreasing, and: 

Jo 

We then define each am as the one positive solution of 

Then there are absolute constants k and C such that if for all m £ M., 

pen(?Ti) > K |^CT,^j + — , 

then 

IE[/i'(s,s™)] < C (if (s, 5„) + pcn(m) + 
where, for every m e M, K{s, Sm) ~ inftes™ K{s, t). 

We notice that what is bounded in fine is the HelUnger distance and not the Kullback. Indeed our evaluation of 
the estimation error, which depends upon the size of the model (its bracketing entropy), dominates the Hellinger 
distance but maybe not the Kullback-Leibler distance. 

In our case, we have parametrized the models m by A^, through definition l|22p . 

To apply Theorem 14.21 we need to find suitable <j)m, and this calls for dominating the entropy integral. We 
reproduce here [1]. 

By ((24|) . it is sufficient to control Hb.i{S,'P{N)). Moreover, the linear extension of the morphism T sends a 
positive matrix to a positive function, and is contractive. So any covering of Q{N) by (5-brackets is sent upon 
a covering oiV{N) by ^-brackets, that is [pfjp^] = [p^L,p^u]. Thus 

HBAS,ViN)) < HBAS,QiN)), 

so that 

HbAS.vHn)) < CHBAS^QiN)). 

Moreover: 
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Lemma 4.3. 



where C is a constant not depending on 5 or N, and can be put to 1 + ln(5). 



Proof. Let {pj : j = 1, . . . ,c{S,N)} a maximal set of density matrices in Q{N) such that for all j ^ k, \\pj — 
Pk\\i > Define the brackets [pj,p^] as 

r 5 Tj S _ 

Then \\pj' — pf\\i = S. Moreover for any p in the ball -Bi(pj, 2^), as \\p — pj\\i < we have 

pf<P<pf 

and as {pj} was a maximal set, this set of brackets cover Q{N). 
SoHB.i{^QiN))<c{6,N). 

Notice that Bi{pj, ^) are disjoint and included in the shell i?i(0, 1 + ^) — -Bi(0, 1 — so that 



< ( — ) ' (25) 

concluding the demonstration. 



From this, we can obtain: 

Corollary 4.4. There is a constant C such that: 



HB,2{5.V^-{N))<CN^\n^. 



Writing 

cj)N{<y) 

and aN{n) the only a such that 



/ jHB,2ie,PHN))de 
Jo 



we get 



n 



□ 



TivH < \/-A^( 1 + WOVln- ) . (26) 
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Indeed 



UN 



(a) < CN 



CNi 



4 



'ln( ^ We 



2 dx 



In - 



2 dx 



In - 



In ■ 



< C7Va|l + A/ln^ 



where we have made use of, in each Une in turn, 

• Corollary [441 

• the change of variables x ~ ■\/ln(A^e^2)2, with ^ = —\fNxe~^ 

• integration by parts, with x seen as a primitive and xe~~ as a derivative 

2 2 

• the upper bound Ce~^ of '^dx for x positive when evaluating the first term. 



We are looking for an upper bound on (Tat, solution of the equation 




Vna^ = CNa \ 1 
We lower bound the second term by 0, and get 

and upper bound 

UN 




We may absorb the in the first multiplicative constant to find ((26l) . Of course we take only the positive part 
of the logarithm. This will always be the case hereafter. 

Applying Theorem 14.21 we get: 
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Theorem 4.5. Consider the collection of maximum likelihood estimators {pN)N&i, that is for any integer N, 

M-Hpm] = JP„hln(p^)] 
peQ(N) 

Let pen : N M-j- and consider a random variable N such that 

P„ [- Hpp- )] + pen(iV) = myP„ [- ln(pp„ )] + pen(iV)) 

Let (a;jv)jvgN a family of positive numbers such that 



JVGN 

Then there are absolute constants k and C such that if 



pen(7V) > ^(iX!(i + (ovln^)) + ^) 
n N n 



the 



nh^(pp,Pp^)] < C [ mt(E[X (p, Q{N))] + pcn(iV)) + 



^AfeN' ' - - ^ ' - ' ^ - ' 
with K{p, Q{N)) = inf^eQ(jv) K{pp,pr). 
Remarks: 

• When designing the penalty, what stands out in this theorem is the general form of the penalty. Now 
the constant k that can be expHcitly computed would be very pessimistic. The best thing to do is 
therefore to keep the general formula for the penalty and calibrate k using cross-validation, the slope 
heuristic [16] or any other appropriate method. 

• If we wanted an expHcit convergence rate for a given state, as for the photon basis in section [3.5. H we 
would first need to know how the Kullback-Leibler distance K{p, Q{N)) is decreasing with N. One thing 
that is obvious, however, is that if we add noise we convolve with the same function pp and Pa for all 
a in Q{N), so the Kullback-Leibler distance is decreasing with the noise, so convergence is faster when 
there is noise... The reason for this is that we are looking at convergence in Hellinger distance, that is 
a distance between the law of the result of the measurement Pp and Pa-. This does not tell us directly 
anything about what we are really interested in, that is the distance between p and a (as operators). 
Indeed we may bound the L'^ or L^ norm between elements of Q{N) by the Hellinger distance, times 
something depending on the sum of the L^ or L°° norms of the /J'^, . And these norms are going (very 
fast) to infinity when there is noise, so that low HeUinger distance gives no indication on the operator 
norms. 



5. Quantum calibration of a photocounter 



This section features a scheme to caHbrate an apparatus M measuring the number of photons in a beam with 
the help of a photocounter. 

The physical motivation is given in Appendix I A. 31 

The first subsection states the mathematical problem. In the two others are studied respectively projection 
estimators and maximum likelihood estimators. 
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5.1. Statistical problem 

The practical problem of calibration of a photocounter turns out to be mathematically speaking an entirely 
classical missing data problem. However, to the best of our knowledge, it has never been studied. We now 
describe this missing data problem. 

We are given samples (i, a;) in N x M from a probabiliy density of the form 



In this expression, the real numbers 6| satisfy J2m — ^- The ipk are the Fock basis functions given in Equation 
idl). For any k, the Pj^ are a probability measure, that is they are non-negative and J^'iZo = 1- 

We know the 6^, and we want to retrieve the , which we do not know. We write P ~ {Pi )i.k- 

To make clearer that this is a missing data problem, we give the following way to obtain this experiment. First 
we choose fc g N with probability given by 6^. We forget fc, which is the missing data. Our data consists in 
{i,x), with i having law given by P^ and x with law ^k{x). 

Notice that the experimentalist has some control on the but usual techniques will yield b^. proportional to 
. This means that the low k are probed faster. 

We propose below two types of estimators P for P. To get results on their efficiency, we must first find 
meaningful distance d{P, P). Since J2i = 1 for all k eN, distances Hke dl{P, Q) = J2i,kiPt ~Qi)^ are bound 
to yield infinite errors on our estimators. We then must weight them, using {ak)kefi of our choice. We shall use, 
depending on the estimator, either dl{P, Q) = J2^,k ^liPl'-Q'l:? with E 4 = 1. or di(P, Q) = J2^^,k (^k\Pt-Q^\, 
with J2k '^fc = 1- Then these distances are bounded by 2 on the set of all P such that {P/^}igN is a probability 
measure for every k. 

Varying the choice of Uk corresponds to putting the emphasis on different k, that is deciding which P^ we 
demand to know with the more precision. If we take the ak decreasing, it means physically that we are more 
interested in the behaviour of our photocounter for a low number of photons. This is usually the case for a 
physicist. A possible choice is to take ak or equal to b^. 

In the next subsection, we use projection estimators, and in the following, maximum Hkelihood estimators. 
5.2. Using projection estimators 

As in the tomography problem, the parameter space is contained in an infinite-dimensional vector space, and 
a natural type of estimators are projections of the empirical law on finite-dimensional subspaces. The problem 
we are left with is then again finding the best subspace. 

Concretely, we consider the distance d^{P,Q) — J^ik^ti^t ^ Qi)^ and write — akP^ . Similarly we shall 
write = o-kPi for our estimator. Then 




(27) 



dl{P,P) 



i.k 



and the law of our samples can be rewritten as 




(28) 
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We may then consider {{bf,/ak)ipk'i-i=i}k,i as a basis of our functions on N x M. We want to use the general 
constructions of section [H We first need a dual basis {gi.k}- Now, the dual basis of {ipl} as functions on K is 
well-known. Those are the "pattern functions" fk,k introduced in [6] (see ([371 ). From this, we deduce: 

9i,k{l,x) ^ 'jjfk,k{x)U=i- 

"k 

With these dual functions, we can define the minimum contrast function: 

\a=l / \i,k j 

where the {la,Xa) are our data, that is n independent samples with law p. 

Our models m & A4 consist in the subsets of N^. If {i, k) ^ m, then =0. In a model m, the estimator 
given by minimizing the contrast function is then 



lya^^^hll^ for (i,fc)€m. 



n ^ ak 

a— 1 



The penaHzed estimator is as always the projection estimator of the model rh such that: 



TO — arg min 7ti(-P''™-') +pen„(TO). 

nieM 

We also use the usual notation for the distance to a model: 

d2{P,m) = inf d2{P,Q). 

We then obtain from the general theorems of section [3l 

Theorem 5.1. Let P he a photocounter and (uk) and {bk) with J^k'^k ~ ^k^k ~ ^- ixi,k)(i,k)Gn^ such 
that J2i k s^^'''' = S < oo. We define a penalty as 



pen„(m) = ^ (1 + e) (ln(A/, 



k 



with 



Mi^k = ^(sup/A;^fe(a;) - inf 

"k 



Then the penalized estimator fulfills 



E 



2 + e 



dliP,P) 



< inf ( 1 + - )d^(P,m) + 2pen„(TO) + ^^-t^. 



Theorem 5.2. Let P be a photocounter and (ak) and (bk) with J^k'^k ^ St ^fe = 1- {yi,k){i,k)ef>i^ such 
that J2i m e"^' " = S < oo. Let then 

Xi^k = 2In + yi^fc. 
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For any S G (0, 1), with 
pen„(m) = 



pen, 



1 + e 



1-S 



n bj. 



1 1 

3 + ^]x,. 



«fc Wfk.kWr 



there is a constant C such that: 



E 



2 + e 



4iP, F) - 1 + - inf 4{P, m) + 2 penjm) V 



< 



where Mn is the set of models m for which {i, k) E m implies Xi,k < n. 



Remarks: 

• As with the estimation of states with tomography in section [Sj we choose with high efficiency the best 
subspace. It should be noticed that convergence is fast if the photocounter is good, and could be slower 
if it is bad. In the latter case, we know it is bad, though. Indeed, the dependence of the convergence 
rate on the photocounter P lies in the approximation properties of the models - subspaces - to, that is 
on how fast d\{P,m) decrease when to gets bigger. Now for an ideal photocounter, we need only the 
{i, i) to be in to. The penalty would be as low as possible when neglecting what happens to beams with 
more than a given number k of photons. For a worse photocounter, to have a good approximation of 
how a fc-photons beam is read, we might need many i, and the penalty would include all the pen*'*^. 

• The estimator depends only weakly on (ofc) (unlike the distance), which is good news as it is somewhat 
arbitrary. Indeed, the empirical P^ does not depend of this sequence at all, nor do the main terms in the 

threshold on P^ of both theorems. For Theorem 15. 1^ this main term is \J a^^(l + e) \n{Bi,k)Bi^k/\/n. 
Now Bi^k depends linearly on ak, so the only Uk left in this expression is in the logarithm which can 
be developed as lii{Bi,k/ak) + In(afc). In this way, we see that we only get another term in the penalty. 

ln(!l.9«.felL)/((l ^ '^)"-); and as gi^k 
is proportional to ak, the situation is the same. 

• The process by which we get our data includes a tomographer and the laws p{i,x) were given in the 
ideal case when there is no noise. If there is noise, as briefly sketched in section 13. 6^ these laws are 
different. However we may characterize the noise with a single < ?/ < 1. We then have for free the 
same theorems for r/ > ^: we only need to replace fk,k with /^j,. 

5.3. Maximum likelihood procedure 

In this case, our results are easier expressed with the distance 

di(p,p) = Y.^k\p^~p^ 

■i.k 

with E'^ = QkM^ and Efe ^fc = 1- We denote w, = J2^. E^- . Notice that 'Wi = '^- 

Recall that our data consists in n independent samples {la, Xa) with law p given by Eq. ((27|). 



For Theorem 15. 2[ the threshold is essentially a/8(1 + k 
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The main difficulty with applying here Theorem 14.21 lies in that the Kullback distance to the models is usually 
infinite (if we have = for all k for some i, then j5(i,R) = and this is generally not the case for p{i,M.)). 
The easiest way around is to keep independence and restrict attention to some set of i. 

Explicitly, we take an ordering on the possible results i of the photocounter (typically, if we expect that one 
result corresponds roughly to a given number of photons, we can order them in increasing order. The idea is 
that the results that interest us most should come first). We then choose, still beforehand, Imax G I^, and we 
restrict our attention to the first i E [0, Imax]- We just throw away the part of the data where the photocounter 
gave a result more than Imax- We are left with data size nj^^^, with law p/,„„^ on [0, Jma^:] x R: 

P|[0,/„a:r]xR 



This law is the probability measure associated to the apparatus P for which Pj' = — - — — li</^^^. 
The models mj^K we work with are indexed by if € N and / < Imax- They are given by the constraints: 



E- =0 if i > /, 



max 

E- ^0 ifi> Imax and k<K 

E^ = flfc for k<K 

i<I 



E'^ ^ iov k > K and i < Imax- (29) 



Any such element gives a probability measure on ([0, Imax] x K). Similarly to equation l|28p . the corresponding 
probability law reads p{l,x) = ^b^a^^ Efipkix^'^i^i- The fourth condition (|29| does not increase the 
complexity of the model and ensures that the Kullback distance remains finite. 

We can now use an empirical maximum likelihood procedure to select within each model an estimator. It 
minimizes on each mj^K the contrast function 

n 

ln{Q) = ^ -lng(?fc,Xfe). 

k=l 

where Q is an element of the model to/.k and q the associated probability law. 

We then use Theorem l4.2l to select the model of which we keep the estimator, through a penalization procedure. 
We obtain the following theorem. 

Theorem 5.3. Consider the collection of maximum likelihood estimators {Pi .k) i <i,„ax .Ken > defined as mini- 
mizers of 

ln{Pl,K) = inf "fn{P) 

Let pen : [0, Imax] x N ^ M 6e a penalty function and define (/, K) by 

ln{M^i^K)) + Pcn(/, K) = Jnf^^^7n(A.K) + pcn(/, K). 
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Let {xiji) he a family of numbers such that 



E 



S < oo. 



Then there are absolute constants k and C such that if 



pen(/,A') > K (/ + l)(/v+l 



K 



the 



E 



where K{pi^^^,mi^K) = infgem/ K K{j)i^^^,q) , intended as the Kullback distance on [Q,Imax\ x 
Remarks: 



• As with projection estimators, we can expect fairly quick approximation if the photocounter is good. 
Indeed, for K = Imax and the ideal photocounter, the distance K{pi^^^,mj^^^^j() = 0. 

• Like projection estimators, the maximum likelihood strategy can also be used with noise. If 77 > i, we 
get the same theorem changing fk,k in fj! Just notice that the infinite norm ||/fc,fe||oo is exploding. 

• As in section m an explicit computation of k would be over-pessimistic and it is best to estimate it with 
a data-driven procedure. 



Proof. First we rewrite and bound the distance di in a way that suits our purpose. We separate the entries 
corresponding to measurement results bigger than Imax, and we recall at the third line that J2ieNpi ~ ^k- 
Then 

d,ip,p)^j2\^'~^'\ 

i,k 

- E E^^ + E E 1^^-^ 



k i<I„ 



i>I„ 



& - 



^ E E^.'^ + E 2«^M ^ 

k \ \i<Imc 

E E p^E^f + Efsa.A E 



1 



i>I„ 



& - 



1 



- 1 -Bf 



2 ^» + E ^ E 



i>I„ 



i<I„ 



Ef - 



-E 
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Let us now work a little on the last term: 

1 



So that 



w,. 



J ■j^fk,k{x)l^=idpi^^^{l,x), 



Ef 



2fk.k{x)li=ldp{l,x). 



fk,k{x)li=id{pi^^^ ~p){l,x) 



< ^\\fk,k\L I u=id\pi^.^-m,x). 



Summing over i, we get: 



E 



i<i„ 



< ^WAklL I d\pj_-p\{i,^)- 



We may then bound the distance between the POVM we calibrate and our estimator by 



ken 



di{P,P) = 2 ^» + E (^II/m-IIoo / -pK''^;) 



Finishing the proof of our theorem amounts to controlling / d\pi^^^ — p\{l,x). We first apply Theorem [47 
(assuming that our penalty is big enough, which we check below). We get: 



E 



^ < (^^^ inf^^^i^(pw,m7,if) + pen(/,iC) 



We then use the bound (|24| of the square of the L^-distance in the Hellinger distance, and finish with Jensen, 
using the concavity of both the function x i—* {C A x) and the square root. 



J>/marL- k&i \ \ k J 

< E + f f2«fe A (c^ IIA.felL Jh^ (pw -Pi-.k)]] 



^ E ^. + Ef2«'^^^teii/"iic 



K 



inf Kivj , mi k) + pcn(/, K) H 
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The only thing we still have to check is our penalty. We must dominate Hb.2{5^'P^^{I , M)) where 



With the same reasoning as in sectionlU it is sufRcient to dominate HB,i{S^,mj^ji). We then mimic lemma l43l 
All the elements of uii^k are on the L^-sphere of radius X]fc<x '^k of a vector space of dimension (K + + 1). 
We can then associate a maximal collection of brackets to a maximal collection {Pj) of P £ mi^K separated by 
d^/{2{K+l){I+l)). The balls Bi{Mj, are disjoint and in the shell Bi{0,J2k<K ^fe + (k+i)(i+i) ) " 

-^1(07 Sfe<A' ^fe ^ (K+ri(7+i) )• with equation (|25l) . we obtain 

i^B.l(<5^ m,,A) < CiK + + 1) In (^Ii^±^l±ll 
Imitating the calculation in the proof of corollary 14. 4[ we find that the solution ai_K of the equation 



V^^'^Ik - I ^HbA5.V^^{I,K)) 



admits this upper bound: 



cyi,K < '-^ ^(1 + v/lnn/„„J 

V '^J^ax 

We may absorb the latter 1 in the constant, as long as nj^^^ > 2... 
This ends the proof. 

□ 
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Appendix A. Background in quantum mechanics 

Subsection lA.ll gives parallel developments of classical statistics and quantum statistics, so that any quantum 
notion is Hnked with a classical equivalent. 

Subsection IA.2I describes both the experimental setup of quantum homodyne tomography and some basic 
mathematics playing a role in it. More precisely, it highlights several different representations of the state to 
be recovered (our unknown) and the links between them. 

Subsection IA. 31 is background for section [5l Notably, it explains where the formulas such as ((28]) come from. 
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A.l. Statistics: classical and quantum 

We have here three different parts. The aim is to highhght the equivalences in classical and quantum formalism. 
The first part lies then upon the classical world, the second part recast this construction as a special case of 
what will be our quantum formalism, and the third part describes these quantum statistics. Bold numbers 
refer to the same number in the other sections. They might be repeated inside a section if the same object is 
introduced under different forms. 

In this short introduction to the subject, we shall restrict ourselves more or less to describing what physical 
measurements can be done and how they can be encoded mathematically. In other words, we characterize what 
information can be retrieved from a system. 

A. 1.1. Classical 

In the classical setting of statistics, we are working with probability measures p { 1 } on a probability space 
{X,A) { 2 }. For comparison, we recall that probability measures are normalized { 3 } real { 4 } non-negative 
{ 5 } measures. Similarly measures are elements of M{X,A) { 6 }, the dual of L°°{X,A) { 7 }. 

Notice that the probability measures form a convex set, the extremal points of which are the Dirac measures 
{ 8 } on a; for x G {X,A). They may then be described by a; { 9 }. If we want to draw on the analogy with 
physics {X,A) may be viewed as a phase space, and the x would be the pure states. A general probability 
measure would describe a mixed state. These are systems that have a probability to be in this or that pure 
state. Any mixed state (probability measure) can be decomposed in a unique way over pure states (Dirac). 

A statistical model { 10 } consists in a set of probability measures pe on a probability space {X, A) indexed by 
a parameter 0, for6'e9{ 11 } the parameter space. A statistical problem consists in determining as precisely 
as possible, with a meaning depending on the instance, a function of 9. 

Now we must gain access at information on these 9 in some way. What we have access at are random variables. 

The aforementioned space L°°{X,A) is the space of real bounded random variables / { 12 }. By analogy with 
the quantum case, we call these / observables. They correspond to the set of physical measurements that can 
be carried out on the system, to what can be "observed". 

"Measuring" an observable / yields a result f{x) { 13 }, with law: 



where B is the borelian cr-algebra of M. Notice that this result is not random for a pure state. 

Notice also that the way we could see the probability measures p as elements of the dual of L°°(X,A) was by 
writing p(/) = f{x)dp{x) { 15 }. 

The most general type of statistic or estimator we can extract from data, including random strategies, is 
obtained by associating to each x a probability measure on an auxiliary space (Xa, A, a) { 16 } and draw a final 
result according to this probability measure. This is equivalent (at the price of changing the auxiliary space) to 
measuring a function / { 17 } on a space {X ® Xa,A^ Aa) { 18 } according to a probability measure pe <8) s 
{ 19 } with s independent of 9. 

If we write ((30|) in this case, we get 




for B e S { 14 } 



(30) 




for B eB. 
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If we integrate out Xa, this yields 



¥g[feB]= f ,fB{x)dpe{x) 



for B e S { 20 } 



Jx 



where 



• /k = 1 { 21 } 

• < /s < 1 { 22 } 

• For countable disjoint Bi, fs. 



fu.B, { 23 }. 



As a remark, the result f{x) is essentially a label. We could write the same formula for functions with values 
in other measure spaces (3^, B) than M. Just let B be the cr-algebra on this space. In this way, we retrieve in 
particular estimators in R''. 

Another very important remark is that if we have access to two statistics / and g, we have access to both 
{ 24 }. Indeed suppose that / was taking its values in {y,B) and g in {Z,C). Then take a new statistic with 
values in the product space {y ^ Z,B ^C), characterized by hs^c = fs * 9c as real functions on {X, A). We 
see that the three conditions are satisfied, and that the marginals of h are / and g. 

A. 1.2. From classical to quantum 

The above description was already somewhat non-conventional, with the parallel with quantum formalism in 
mind. In this subsection, we take one further step, by setting classical probability as a special case of what will 
be our quantum probability theory. 

To have something easy to understand, we start from a finite probability space {X,A) = {1, . . . ,d} { 2 }. We 
associate to it the Hilbert space of complex valued functions on this space, that is Ti = { 2 }. We are here 
endowed with a distinguished orthonormal basis {\ei)}i<i<d with je^) the function whose value is one on i and 
zero elsewhere. 

Notice by the way the notation \ip): this is a physicist's notation for vectors, elements of H. They call this a 
"ket". The associated Hnear form, that is, the adjoint of the vector, is called a "bra" and denoted {ip\. Thus 
(01V') is the scalar product of 10) and lip) (a "bracket"). 

Now to the probability measure p = {pi, ■ ■ ■ ,Pd) { 1 } on {1, . . . , d}, we associate the matrix p { 1 } diagonal 
in our special orthonormal basis { 6 }, with diagonal entries (pi, . . . ,Pd)- As this is a diagonal matrix in an 
orthonormal basis, with non-negative elements, this is a self-adjoint { 4 } non-negative { 5 } matrix. Moreover, 
as J2iPi = 1 { }) it has trace 1 { 3 }. 

We see that the extremal points of our set are of matrices are the orthogonal projectors on the lines spanned by 
our special eigenvectors, that is |ei)(ei| { 8 }. They correspond to the Dirac measures on i. We may represent 
any of these pure states by the eigenvector je^) { 9 }. We may also rewrite p = J2iPi\^i){^i\- 

A statistical model { 10 } consists in a set of non-negative matrices pe with trace 1, on a Hilbert space H, 
diagonal in the {|ei)}i basis, indexed by a parameter 6, for 6 £ Q { 11 } the parameter space. A statistical 
problem consists in determining as precisely as possible, with a meaning depending on the instance, a function 



As we have done for probability measures, we identify / £ L°°{{1, . . . , d}) { 12,7 } with the diagonal matrix 
O £ M{C'^) { 12,7 } whose diagonal elements are the Oi^i = f{i). This is still the dual of the set of matrices 
diagonal on our special basis. We view the action of p by taking the trace of the product with p. That is 
p{f) = tr {pO) { 15 }. One can see that we have only rewritten the classical formula for the expectation. 



of 0. 
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Equivalently, measuring an observable O yields as a result an eigenvalue of O { 13 }. The law of the result is 
given by: 



Pp [O e B] = tr [pPo,B) for B e S { 14 } 

where Po,b is the projection upon the space spanned by the eigenspaces of O corresponding to those eigenvalues 
A of O such that X G B. In other words, in our case, O = J2i /(*)|ei)(ei|. Then Pq.b = J2i\f{i)£B This 
Po,B is playing the role of lfi^x)&B in the classical setting. And we take note that tr {pPo,b) = Si|/(i)eB 
we should obtain from the classical formula. 

We can encode in the same framework the general strategies for estimators, provided that Xa is also finite { 16 }. 
The auxiliary space is then identified to Ha = C'*" . We have matrices P0®a { 19 }, with a independent of 9. We 
are allowed to use as observable O { 17 } any matrix diagonal in the same basis as these pe ® a. The procedure 
equivalent to the partial integration on Xa is then taking partial trace on Tia in Yg[0 G B] = tr {{pe ® <j)Po,b)- 
And this yields tr {pgM{B)) { 20 } with 

• M{R) = 1-H { 21 } 

• M(B) is non-negative and diagonal in the {|ei)} basis { 22 } 

• For countable disjoint Bi, J2i M{Bi) = M(1J. S;) { 23 }. 

Here again, we see that if we have access to Oi and O2 characterized by the families Mi[B) and Af2(C), we 
have access to both { 24 }. Our new measurement would be characterized by N{B ® C) — Mi{B)M2(C) as 
multiplication of matrices. Notice that this set of matrices still satisfies the three above conditions. Especially, 
the fact that they are still non-negative stems from that they are diagonal in the same eigenbasis. 

Going from classical to quantum now means throwing away our special eigenbasis {|ei)}. The immediate 
consequence will be that we shall deal with objects that do not commute. And of course, we did not restrain to 
finite probability spaces in the classical case. Likewise, we do not restrain to finite-dimensional Hilbert spaces in 
the quantum case. We shall therefore deal with operators rather than matrices. Keeping the finite-dimensional 
example firmly in mind should be a guide to the intuition of those less proficient in operator theory. 

A. 1.3. Quantum 

A quantum system is described by a density operator p { 1 } over a Hilbert space TL {2 }, that is: 
Definition A.l. : Density operator 

A density operator, usually denoted by p, is a trace-class linear operator on a (complex, separable) Hilbert space 
v. that satisfies: 

• p is self-adjoint { 4 }. 

• p is non-negative (notice that this impHes self-adjointness) { 5 }. 

• tr p = 1 { 3 }. 

If H. is finite-dimensional, those are just the (self-adjoint) non-negative matrices with trace 1. 
We denote by Siji) the set of density operators on Ti. 

Density operators are a convex set, too. The extremal points are called "pure states". They are the orthogonal 
projectors on 1-dimensional spaces { 8 }. Thus we can represent them by a norm 1 element of denoted by 
\ip) { 9 }. The corresponding density matrix is then p = Notice that it would be more precise to speak 

of as an element of the projective space VTi, but we conform here to the usage of physicists. Notice also that 
there are infinitely many pure states even in the finite-dimensional case, unlike in the classical framework. Let 
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US finally signal that the decomposition of a mixed state on pure states is not unique. It is essentially unique if 
we further impose that the pure states of the decomposition are all orthogonal, though. 

A quantum statistical model { 10 } consists in a set of density operators pe on a Hilbert space H indexed by a 
parameter 9, for G & { 11 } the parameter space. A statistical problem consists in determining as precisely 
as possible, with a meaning depending on the instance, a function of 6. 

Now the role of random variables is played by observables. Those are the elements O { 12 } of Sso(H) { 7 }, the 
bounded self-adjoint operators upon H. If we are dealing with finite-dimensional H, those are the self-adjoint 
matrices. 

As a remark, the dual of Bsai'H) is the set of self-adjoint trace-class operators, which p is in. This duality is 
given by the formula of the expectation of measuring O on p, also called Bom's rule: 

Ep[0]=tr(pO) {15} (31) 

When measuring O, the result is an element of the spectrum of O { 13 }, that is in the finite-dimensional 
picture, an eigenvalue of O. The law of the result when measuring O on p is: 

Pp [O e S] = tr {pPo.B) for B e S { 14 } (32) 

where Pq.b is coming from the spectral measure of O. This is an object associated to self-adjoint operators 
through the spectral theorem, whose main property is that the expectation of the law above is given by the Born's 
rule for any density operator p. We only give the derivation for finite-dimensional Ti. Then, as O is self-adjoint, 
we can diagonalize it in an orthonormal basis, and write O = Ai|V'i)(?/'i|. Then Po,b = '}2i\Xi£B 
We see that in this case the law of the measurement is coherent with the expectation given by Born's rule ((311) . 

Generally {Po,b}b is a projector valued measure, the definition of which we give below. To each projector 
valued measure corresponds an observable, and to each observable corresponds a projector valued measure. We 
may then consider that this concept is also a definition of an observable. 

Definition A. 2. : Projector valued measure { 12 } 

A projector operator valued measure {P{B)}BeB is a set of operators on H. such that: 

• P{B) is an orthogonal projector. 

• P(M) = 1-H. 

• For disjoint countable S;, P{Bi) = P{[j^ B,). 

Notice that these are the axioms of a probability measure, except that we do not deal with real numbers but 
with projection operators. 

Combining this definition with the definition of a density operator, we can check that formula l(32|) yields a 
true probability measure. Indeed, as both p and Po,b are non-negative, the probability of any event is non- 
negative. With the countable additivity property of projector valued measure and linearity of product and 
trace, we get the countable additivity of a probability measure. Finally, the probability of the universe is 
tr (pPo.r) = tr {p\n) = 1- 

Remark: - even for a pure state, the result of the measurement is random, unless the pure state is an eigenvector 
of O. 

Now what is the most general estimation strategy, or measurement? The right analogy is that of the auxiliary 
space. We measure observables O { 17 } on a Hilbert space Ti, ® Ha { 18 } under the density operator pg® a 
{ 19 }, with a independent of 9. Now we may take partial trace in l(32|) along Tia, and we obtain equivalence 
of this scheme with measuring a positive operator valued measure (POVM). 
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Definition A.3. : Measurement (POVM) { 17 } 

A measurement M on a quantum system, taking values x in a measurable space {X,A) is specified by a positive 
operator valued probability measure or POVM for short, that is a collection of self-adjoint matrices M{A) : A £ A 
such that: 

• M{X) = 1, the identity matrix { 21 } 

• Each M{A) is non-negative { 22 } 

• For disjoint countable A,, J2i^^{^t) = A/(U A) { 23 }. 
The M{A) are called the POVM elements. 

The law of measuring M on p is given by 



With the same reasoning as for projector valued measure (which are a special case of these POVMs), this is a 
genuine probability measure. 

A special case of POVM is that of a POVM dominated by a-finite measure v on {X,A), that is 



where m(x) is positive for all x and m{x)dv{x) = I-h- The POVM associated to homodyne tomography is 
dominated by the Lebesgue measure. 

The very important difference with the classical world is that if we can have access to A/i or M2, in general, we 
cannot have access to both simultaneously { 24 }. We cannot copy what we have done in the former paragraph, 
since Mi{A)M2{B) + M2{B)Mi{A) might not be non-negative if Mi{A) and M2{B) do not commute. More 
generally, there is usually no way to create a new POVM N with values in {X ^ y,A^ B) such that the 
marginals are Mi and M2. Notably, two observables that do not commute can never be measured simultaneously. 
As an example, consider that A'h and M2 are two projector valued measures on C^, each with values in 
{0,1}, corresponding to observables diagonal in different bases {eo,ei} and {/o,/i}. Then 7V(0,0) should 
be proportional both to |eo)(eo| and |/o)(/o|- So that it is null. Same remark for the other N(i,j). Thus 
N{{0, = 7^ 1. So that it is null. 

The truly quantum feature of quantum statistics lies in that we should decide which measurement is to be carried 
out. Once we have chosen our measurement, we are left through ((33l) with a classical statistical experiment. 
This is the case in this article. 

As a last remark on the subject, we could have developed a sHghtly more general formaHsm, based on C*- 
algebras, that would have been parallel to Le Cam formulation of statistics. In practical applications, the 
formalism above is usually sufficient. 

A. 2. Quantum homodyne tomography 

The system we work with is the harmonic oscillator. Both in classical or quantum mechanics, the harmonic oscil- 
lator is a basic and pervading system. It describes, notably, a particle on a line, or a mode of the electromagnetic 
field (that is monochromatic light), as in our case. 

The state of a quantum harmonic oscillator is described by an operator on L?{R) (this is the Hilbert space 
{ 1 }). There are two important observables corresponding to the canonical coordinates of the particle. If we 



Pp [Oe A] =tr [pM{A)) 



iov A^A {20 }. 



(33) 




(34) 
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know the expectation of measuring on a state p any operator in the algebra they generate, then we know p. 
Those observables are P,the magnetic field, and Q, the electric field. They satisfy the (canonical) commutation 
relations: 



[Q,P] =QP PQ 
= il. 



They are realized as: 



{Pip2){x) = -I . (35) 



As they do not commute, they cannot be measured simultaneously. However, any linear combination can 
theoretically be measured. These = sin(0)Q + cos(0)P are called quadratures. 

Using an experimental setup proposed in [18], each of these quadratures could be experimentally measured on 
a laser beam [17]. The technique is called quantum homodyne tomography. 

The optical set-up sketched in figure [2] consists of an additional laser of high intensity |z| ^ 1 called the local 
oscillator, a beam splitter through which the cavity pulse prepared in state p is mixed with the laser, and two 
photodetectors each measuring one of the two beams and producing currents /i,2 proportional to the number 
of photons. An electronic device produces the result of the measurement by taking the difference of the two 
currents and rescahng it by the intensity \z\. A simple quantum optics computation in [14] shows that if the 




Figure 2. Quantum Homodyne Tomography measurement set-up 
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relative phase between the laser and the cavity pulse is chosen to be then (/i — /2)/|-z| has density Pp{x\(l>) 
corresponding to measuring . 

Knowledge of Pp{x\(j)), the law of the result of the measurement on p, for all (j), is enough to reconstruct the 
state p. As we have seen, the experimentalist may choose (j) when measuring. We assume that the measurement 
carried out on each of the n systems in state p is the following: first choose uniformly at random, then measure 
X^. We get a random variable Y = (X, (p) with values in M x [0, tt) whose density with respect to the Lebesgue 
measure is Pp{x,(f) = ^pp[x\cj)). 

Now we make explicit the links between p, Pp{x, (f) and the Wigner function Wp. First we write p in a particular 
basis, physically very meaningful, the Fock basis, already given in Sec. [2j 

M^)^H,ix)e--"/', 

where Hk is the k-th Hermite polynomial, normalized so that the L^-norm of ipk is 1. The projector on ipk is 
the pure state with precisely k photons. We also denote this state by the ket \k). 

The matrix entries of Pp in this basis are pj^k = {"^j-, pi^k) ■ We can then derive from ((3T1) and (|35|) the formula 
we gave in Sec. [21 

T:5(L=^(R)) — > Li(Kx[0,7r]) 

P ^ lpp:{x,<j>)^f2 p,.kiJj{x)M^)e-'^^-'^'^ ■ (36) 



The mapping T associating Pp to p is invertible, so we may hope to find p from the independent identically 
distributed results Yi, F2, . . . , y„ of the measurements of the n systems in state p. This implies notably that Pp 
is another representation of the state. 

More explicitly, there are pattern functions fj.k [6] against which to integrate pp to find any matrix entry of p 
in the Fock basis, that is: 

dx r ^p^(^,0)/^.,(a;)e^(^-'=)^ 




These fj^k are bounded real functions. That inverting the Radon transform is an ill-posed problem can be seen 
in the behaviour of fj,k when j and k go to infinity. Several formulas were found for these functions [15], among 
which: 

fj.k{x) = ^{xA^)M^)) (37) 

for k > j, where Xj and (pk are respectively the square- integrable and the unbounded solutions of the Schrodinger 
equation: 

Another one, maybe more practical when it comes to theoretical calculations, or when we add noise (see section 
Km is: 



-+2irx k-j rk-j I 2 



where the L'j are the Laguerre polynomials, that is the orthogonal polynomials with respect to the measure 
e-'^x'^ on K+. 
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Let's now have a look at the Wigner function. This is a real function of two variables, with integral 1, but that 
may be negative in places. It can be interpreted as a generalized joint probability density of the electric and 
magnetic fields q and p. As both cannot be measured simultaneously, the negative patches are not nonsense. 
On the other hand, any projection on a line of the Wigner function must be a true probability density, as it 
is the law of X^, which is an observable. In fact, the Wigner function may be seen as the probability density 
on resulting from l(33|) when measuring on p a "POVM" whose elements are not non-negative, but whose 
marginals on each line R are the X^. 

As we have already said in the introduction, pp is the Radon transform of the Wigner function. The Wigner 
function can be defined by its Fourier transform. This definition tells how to find the Wigner function W of the 
state from its density matrix p: 

T^Wiu, v) = tr (pe-^"'^-*"^). (38) 
On the other hand, the generating function of pp(-|0) is 

E[e''*^*] = tr(pe**^*). 

In other words, T-iW [tcQS(\),tsm(\)) = T[p p{- , (t>)]{t) . These relations are known to imply that Pp = Il{W) [8] 
where R is the Radon transform. Explicitly: 

/oo 
W{x COS <j> + y sin (f>, x sin (j) — y cos 4>)dy. 
-oo 

The Radon transform is illustrated by Fig. [U given in Sec. [21 

Finding the Wigner function from the data means then inverting the Radon transform, hence the name of 
tomography: that is the same mathematical problem as with the brain imagery technique called Positron 
Emission Tomography. 



A. 3. Physical origin of the photocounter cahbration problem 

An experiment usually ends with a measurement. We need, however, an apparatus to measure. And we first 
have to know what is the meaning of the result the apparatus is giving us: it is not at all obvious a priori that 
if our new thermometer says "31° C", the temperature cannot be "32° C". That is why we must calibrate our 
measurement apparatus. In quantum mechanics, this means associating with each result i of our measurement 
the positive operator such that P is the POVM (see definition IA.3P corresponding to our measurement. 

In [7], a general calibration procedure was intoduced. The procedure relies on comparing with an already 
calibrated apparatus, using entangled states. Let us describe this more precisely in the special case of the 
photocounter. 

A photocounter is an apparatus that aims at counting the photons in a beam. The ideal detector D has therefore 
POVM elements given by D{i) = in the Fock basis. Recall we use the physicists' notation, where |-) is a 
vector and (-1 is the associated linear form. Moreover |i) is the vector corresponding to the pure state with i 
photons, that is the function Vi on L^(M), that we had defined in 

Models of the noise (non-unit efficiency and dark current) leave the POVM diagonal in this basis. Thus, we 
are only interested in the diagonal elements of Pi in the Fock basis. To obtain those we send a twin beam 
state, one of the beams in the photocounter, the other in a homodyne tomographer. We get a result i from the 
photo-counter, and x from the tomographer (figure [Sj as we are only interested in the diagonal elements, we 
shall see that we do not need the phase (j), as long as the experimentalist chooses it randomly). We then have 
to process these outcomes {i,x) to find P. 
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Figure 3. Experimental set-up to determine the POVM associated to an unknown photo- 
counter P. We use it to measure a known bipartite state |s), jointly with a tomographer T. 
The photocounter gives a result i and the tomographer a result x. From these samples, we 
construct an estimator {Pi} of the self-adjoint operators associated to the results {i} by the 
photocounter P. 



Mathematically, the twin beam is a system in a state |s) = J2T=o^>^\^) This notation (where we may 
choose the bk non-negative) means that the underlying Hilbert space is L^(R) (g) i^(M), and that p is the pure 
state that projects on the line spanned by this vector. Here again, \k) is the vector corresponding to the pure 
state with k photons. Finally J2k 1> that the vector state \s) is normalized and the density operator is 
p=\s){s\. 

Now, what is the law p{i, x) of the samples we get? By l(36)) we see that the POVM associated to the tomographer 
is dominated by the Lebesgue measure on M x [0, tt), as in (|34l) . That is {j\tx,4,\k) = ijj{x)4'k{x)e~^''^~''^'^ , where 
we have denoted t^,^ the self-adjoint operator associated to the result {x, cf)) for the POVM of the tomographer. 
If we forget about after having chosen it randomly, we then get {j\tx\k) = tpk{x)'^lj=k- We have now all the 
ingredients for calculating our law, given the notation {k\Mi\k) = . 

p{i,x) =tT(p{Pi®tx)) 
- {s\{P^®tx)\s) 

= Kbk,{{ki\® {ki\){P,®tx){\k2) ®\k2)) 

ki .k2 

= KbkAki\P^\k2){ki\tx\k2) 

ki.k2 

oo 

k=0 

(As a remark, the fourth Hne shows that the use of the phase would be to retrieve the non-diagonal elements, 
in which we are not interested.) 

We have thus recovered l(27|). and explained how we got the data with which we want to estimate the M™. 



40 TITLE WILL BE SET BY THE PUBLISHER 

References 

[1] L.M. Artiles, R. Gill, and M. Guta. An invitation to quantum tomography. J. Royal Statist. Soc. B (Methodological), 67:109- 
134, 2005. 

[2] O. E. Barndorff-Nielsen, Gill, R., and Jupp, P. E. On quantum statistical inference (with discussion). J. R. Statist. Soc. B, 
65:775-816, 2003. 

[3] C. Butucea, M. Gul^a, and L. Artiles. Minimax and adaptive estimation of the wigner function in quantum homodyne tomog- 
raphy with noisy data, to appear in Annals of Statistics, 2005. 

[4] L. Cavalier and J.-Y. Koo. Poisson intensity estimation for tomographic data using a wavelet shrinkage approach. IEEE Trans, 
on Information Theory, 48:2794-2802, 2002. 

[5] G. M. D'Ariano, Leonhardt, U., and Paul, H. Homodyne detection of the density matrix of the radiation field. Phys. Rev. A, 
52:R1801-R1804, 1995. 

[6] G. M. D'Ariano, Macchiavello, C., and Paris, M. G. A. Detection of the density matrix through optical homodyne tomography 

without filtered back projection. Phys. Rev. A, 50:4298-4302, 1994. 
[7] Giacomo Mauro D'Ariano, Lorenzo Maccone, and Paoloplacido Lo Presti. Quantum calibration of measuring apparatuses. 

Phys. Rev. Lett., 93:2004, 2004. 
[8] S. R. Deans. The Radon transform and some of its applications. John Wiley & Sons, New York, 1983. 
[9] Erdfelyi. Higher Transcendental Functions, volume 2. McGraw-Hill, 1953. 
[10] R. Gill. Quantum Asymptotics, volume 36 of Lecture Notes- Monograph Series, pages 255-285. IMS, 2001. 
[11] C. W. Helstrom. Quantum Detection and Estimation Theory. Academic Press, New York, 1976. 
Jl2] A. S. Holevo. Probabilistic and Statistical Aspects of Quantum Theory. North-Holland, 1982. 

]13] J. Kahn. Selection de modeles en tomographic quantique. Master's thesis, Ecole Normale Superieure, Universite Paris-Sud, 
2004. 

]14] U. Leonhardt. Measuring the Quantum State of Light. Cambridge University Press, 1997. 

Jl5] U. Leonhardt, Paul, H., and D'Ariano, G. M. Tomographic reconstruction of the density matrix via pattern functions. Phys. 
Rev. A, 52:4899-4907, 1995. 

Jl6] P. Massart. Concentration Inequalities and Model Selection. Lecture Notes in Mathematics. Springer- Verlag, 2006. Ecole d'ete 

de Probabilite de Saint-Flour 2003. 
Jl7] D. T. Smithey, Beck, M., Raymer, M. G., and Faridani, A. Measurement of the Wigner distribution and the density matrix 

of a light mode using optical homodyne tomography: Application to squeezed states and the vacuum. Phys. Rev. Lett., 

70:1244-1247, 1993. 

]18] K. Vogel and H. Risken. Determination of quasiprobability distributions in terms of probability distributions for the rotated 
quadrature phase. Phys. Rev. A, 40:2847-2849, 1989. 



